Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results.
The underlying concept is to use randomness to solve problems that might be deterministic in principle.
They are often used in physical and mathematical problems and are most useful when it is difficult or impossible to use other approaches.
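Monte Carlo methods vary, but tend to follow a particular pattern: define a domain of possible inputs, generate inputs randomly from a probability distribution over that domain, perform a deterministic computation on the inputs, and aggregate the results.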
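Consider a quadrant (circular sector) inscribed in a unit square. Given that the ratio of their areas is π/4, the value of π can be approximated using a Monte Carlo method: scatter points uniformly over the square, count the fraction that falls inside the quadrant, and multiply that fraction by 4.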
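The quadrant estimate can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `estimate_pi`, the `seed` parameter, and the sample count are assumptions chosen for the example.

```python
import random

def estimate_pi(num_samples, seed=0):
    """Estimate pi from the fraction of random points that land in the quadrant."""
    rng = random.Random(seed)  # seeded so the run is repeatable (an assumption)
    inside = 0
    for _ in range(num_samples):
        # Draw a point uniformly from the unit square [0, 1) x [0, 1).
        x, y = rng.random(), rng.random()
        # The point lies in the quadrant if its distance from the origin is at most 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # inside / num_samples approximates pi / 4, so multiply by 4.
    return 4.0 * inside / num_samples

print(estimate_pi(100_000))
```

The error shrinks roughly with the square root of the number of samples, so each additional digit of accuracy costs about 100 times more points.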
In the above case, the exact value of the average is 42.
Training the agent to play Blackjack using Monte Carlo with exploring starts.
Exploring starts means that each episode should begin with a random state and a random action, and follow the policy only afterwards.
Blackjack naturally starts in a random state, because the opening cards are dealt at random.
So the agent's first action is chosen at random: hit or stick.
After that, the agent follows the policy described above.
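The procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the setup used for the results in this article: cards come from an infinite deck, there are no naturals, doubling, or splitting, the dealer hits below 17, and any state not yet updated defaults to sticking on 20 or 21 and hitting otherwise. All names (`play_episode`, `mc_exploring_starts`, `HIT`, `STICK`) are hypothetical.

```python
import random
from collections import defaultdict

HIT, STICK = 0, 1  # hypothetical action encoding

def draw(rng):
    # Infinite deck (assumption): ace = 1, face cards count as 10.
    return min(rng.randint(1, 13), 10)

def hand_value(cards):
    # Return (total, usable_ace); one ace counts as 11 when that does not bust.
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False

def play_episode(policy, rng):
    player = [draw(rng), draw(rng)]
    dealer = [draw(rng), draw(rng)]
    # Below 12 a hit can never bust, so there is no real decision to learn.
    while hand_value(player)[0] < 12:
        player.append(draw(rng))
    trajectory, first = [], True
    while True:
        total, usable = hand_value(player)
        state = (total, dealer[0], usable)
        default = STICK if total >= 20 else HIT  # assumed initial policy
        # Exploring start: random first action, policy for every later action.
        action = rng.choice((HIT, STICK)) if first else policy.get(state, default)
        first = False
        trajectory.append((state, action))
        if action == STICK:
            break
        player.append(draw(rng))
        if hand_value(player)[0] > 21:
            return trajectory, -1  # player busts
    while hand_value(dealer)[0] < 17:
        dealer.append(draw(rng))
    p, d = hand_value(player)[0], hand_value(dealer)[0]
    reward = 1 if d > 21 or p > d else (0 if p == d else -1)
    return trajectory, reward

def mc_exploring_starts(num_episodes, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)     # action-value estimates Q(state, action)
    counts = defaultdict(int)  # visits per (state, action) pair
    policy = {}
    for _ in range(num_episodes):
        trajectory, reward = play_episode(policy, rng)
        # Undiscounted, with the only reward at the end, so the return
        # following every step of the episode equals the final reward.
        for state, action in trajectory:
            key = (state, action)
            counts[key] += 1
            q[key] += (reward - q[key]) / counts[key]  # running average
            # Greedy policy improvement for the visited state.
            policy[state] = max((HIT, STICK), key=lambda a: q[(state, a)])
    return q, policy

if __name__ == "__main__":
    q, policy = mc_exploring_starts(50_000)
    print(len(policy), "states have a learned action")
```

A state never repeats within one Blackjack episode, because every hit changes the player's total or the usable-ace flag, so first-visit and every-visit averaging coincide in this sketch.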
Below is the estimate from the first episode. We continue to refine the estimates by iterating through many such episodes, and finally obtain the graph shown.